Recognition of Handwritten Historical Documents: HMM-Adaptation vs. Writer Specific Training
Abstract
In this paper we propose a recognition system for handwritten manuscripts by writers of the 20th century. The proposed system first applies preprocessing steps to remove background noise. Next, the pages are segmented into individual text lines. After normalization, a hidden Markov model (HMM) based recognizer, supported by a language model, is applied to each text line. In our experiments we investigate two approaches for training the recognition system. The first approach trains the recognizer from scratch, while the second adapts a recognizer previously trained on a large general off-line handwriting database. The second approach is unconventional in that the language of the texts used for training differs from that used for testing. In experiments with several training sets of increasing size, we found that the best overall strategy is to adapt the previously trained recognizer on a writer-specific data set of medium size. The final word recognition accuracy obtained with this training strategy is about
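The abstract does not state which adaptation technique is used. As an illustration only, the following minimal sketch assumes maximum a posteriori (MAP) adaptation of a single Gaussian mean, a common choice for adapting HMM output distributions. The function name `map_adapt_mean`, the weight `tau`, and the toy feature vectors are hypothetical and not taken from the paper. The sketch shows why the amount of writer-specific data matters: with few samples the adapted mean stays close to the general model, and with more samples it moves toward the writer's own data.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP update of one Gaussian mean vector (illustrative assumption).

    prior_mean: mean from the general, previously trained model
    samples:    writer-specific feature vectors assigned to this Gaussian
    tau:        hypothetical weight controlling trust in the prior
    """
    n = len(samples)
    dim = len(prior_mean)
    # Sum the writer-specific samples per dimension.
    sample_sum = [sum(x[d] for x in samples) for d in range(dim)]
    # Weighted combination of the prior mean and the observed data.
    return [(tau * prior_mean[d] + sample_sum[d]) / (tau + n) for d in range(dim)]


prior = [0.0, 0.0]            # toy mean of the general model
writer_frame = [1.0, 2.0]     # toy writer-specific feature vector

few = map_adapt_mean(prior, [writer_frame])         # 1 adaptation sample
many = map_adapt_mean(prior, [writer_frame] * 100)  # 100 adaptation samples
print(few, many)
```

In this toy setting, `few` remains near the prior mean while `many` lies close to the writer-specific vector.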
Related resources
RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts
We present a novel large vocabulary OCR system, which implements a confidence- and margin-based discriminative training approach for model adaptation of an HMM based recognition system to handle multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK based systems which are maximum-likelihood (ML) trained and which try to adapt their model...
Allograph Based Writer Adaptation for Handwritten Character Recognition
Writer adaptation is the process of converting a generic (writer-independent) handwriting recognizer into a personalized (writer-dependent) recognizer with improved accuracy for a particular user. While training the generic recognizer uses large amounts of data from several writers, the adaptation process uses only a few samples from a single user. In this paper we present a) an automatic appro...
Beyond OCR: Multi-faceted understanding of handwritten document characteristics
In the previous chapters, we proposed several features for writer identification, historical manuscript dating and localization separately. In this chapter, we present a summarization of the proposed features for different applications by proposing a joint feature distribution (JFD) principle to design novel discriminative features which could be the joint distribution of features on adjacent p...
Writer adaptation techniques in HMM based Off-Line Cursive Script Recognition
This work presents the application of HMM adaptation techniques to the problem of Off-Line Cursive Script Recognition. Rather than training a new model for each writer, one first creates a unique model with a mixed database and then adapts it for each different writer using his own small dataset. Experiments on a publicly available benchmark database show that an adapted system has an accuracy ...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...